Abstract

A recent Science paper [1] proposed that humans can
discriminate between at least a trillion olfactory stimuli.
Here we show that this claim is the result of a fragile
estimation framework capable of producing nearly any result
from the reported data, including values tens of orders of
magnitude larger or smaller than the one originally
reported in [1]. We conclude that there is no evidence for
the original claim.

1 Introduction 

A recent paper [1] proposed that humans can discriminate
between at least a trillion olfactory stimuli. Using
that paper’s methods to reanalyze the data it presented,
we show that this estimate is problematically fragile.
Specifically, it varies systematically and sensitively (over
tens of orders of magnitude, in both directions) for modest
changes in incidental experimental and analysis parameters
against which a result ought to be robust. Had
the experiment enlisted ~100 additional subjects similar
to the original ones, the same analysis would have
concluded that all possible stimuli are discriminable (i.e.
that each of the more than 10^29 olfactory stimuli possible
in their framework is discriminable from every other). By
contrast, if the same experimental data were analyzed using
moderately more conservative statistical criteria, the analysis
would have concluded that there are fewer than 5,000
discriminable olfactory stimuli - no larger than the folk
wisdom value that the new estimate purports to replace.

As a result, data describing the same underlying perceptual
abilities admit a wide range of extremely disparate,
yet unobjectionable alternative conclusions (including
both the largest and smallest possible estimates
allowed by the analysis framework). We conclude that
the framework is unsound; therefore there may be trillions
of discriminable odor stimuli, or more, or fewer, but
the framework is incapable of settling this issue. Here we
first demonstrate the framework’s fragility, and then explain
its origin. For most of this paper, we remain agnostic
about whether the framework is conceptually sound,
to highlight the fact that it has strictly methodological
problems of a statistical origin that do not depend on the
validity of a competing set of assumptions. In a concluding
section, we explore possibilities for improving the
estimate.



Figure 1: Consistency of an estimator. An estimator is consistent if the
resulting estimate asymptotically converges (in expectation) as sample
size increases (black line). Uncertainty in the estimate (gray area) may
shrink with sample size, but the estimate itself should not systematically
change with sample size, and should converge on the truth. Estimators
without this property are termed inconsistent (the blue line is
a relevant example), and are considered unreliable, as the resulting
estimate can be heavily biased by the sample size. If the estimate has a
minimum and maximum allowed value (see equation 1), an especially
inconsistent estimator can even produce any estimate within that range.


2 Problems with the estimate 

The main concern is that the estimated number of discriminable
stimuli depends steeply, systematically, and
non-asymptotically on choices of arbitrary experimental
parameters, among them the number of subjects enrolled,
the number of discrimination tests performed, and the
threshold for statistical significance. We show below that
the order of magnitude claim of ‘at least one trillion olfactory
stimuli’ requires that those parameters assume a
very narrow set of values. Certainly, the precise value
of an estimate may change as additional data are collected,
but the estimate should not change in expectation;
it should not be possible to make an estimate arbitrarily
large (or small) simply by collecting more (or less)
data. Similarly, the estimate itself should not become
arbitrarily small or large with adjustment of a significance
criterion. Estimates that scale systematically with
such incidental parameter choices are considered statistically
inconsistent (Figure 1). It is the inconsistency of
the present estimate that produces a tremendously large
space of extremely different, yet unobjectionable alternative
conclusions that can be reached about the number
of discriminable olfactory stimuli.

To illustrate that we can correctly recapitulate the analysis
undertaken in [1], Figure 2 shows our reproduction
(using raw supplementary data) of two critical figures
from that paper [1], from which its main conclusion was
drawn. Figure 3 and Table 2 quantify the fragility of
this conclusion, by generating estimates using the same
framework under trivial alternative scenarios in which
different numbers of subjects (or mixtures) were used,
or different choices of statistical threshold (α) were used
for assessing discriminability. See Table 1 for definitions
of parameters used here and in [1]. Thus, we produced all
values shown here by analyzing the data from [1], using
the methods described therein, and varying only parameters.
Code for these and all subsequent analyses is available
at http://github.com/rgerkin/trillion.

Table 1: Definitions of parameters

z      Estimated number of discriminable olfactory stimuli
C      Number of distinct compounds available to make mixtures
N      Number of distinct compounds in a mixture
O      Number of distinct compounds shared by a mixture pair
D      Number of distinct compounds in one mixture of a pair
       that are not shared by the other (D = N - O)
class  All mixture pairs with the same value of N and D
d      The value of D for which mixture pairs of a given N are
       more likely than not to be discriminable at a rate
       significantly above chance


In the experimental framework of [1], there are three sets
of experiments, varying in the number of distinct molecular
components N per mixture tested. We consider
the N = 30 case (without loss of generality) for which
there are ~10^29 possible olfactory stimuli, and for which

[Figure 2 image: panels A and B. Horizontal axes: % mixture overlap (A); % mixture overlap allowing discrimination (B).]

Figure 2: Reproduction of the main result published in [1], from analysis
of raw data made available in supplemental materials of [1]. Compare
to Figures 3 and 4 in that publication. A: Discriminability vs. mixture
overlap, expressed as a percentage of the mixture size N. From this
analysis, [1] derives ~51% (vertical dashed line) as the critical
value of mixture overlap at which 50% of mixtures achieve ‘significant
discriminability’. B: Estimated number of discriminable mixtures z vs.
mixture overlap (expressed as a percentage of N) allowing discrimination.
The plot is obtained by regression and interpolation of results like
those in A, combined with equation 1. For a value of ~51% as obtained in A,
one obtains the ‘trillions’ figure reported in [1].


the smallest possible number of discriminable stimuli is
~4500 (see equation 1 in section 3 below). Figure 3 and
Table 2 thus demonstrate that 1) there is a regime of reasonable
parameter choices for which one concludes that
all possible olfactory stimuli (i.e. all ~10^29 of them) are
discriminable; and 2) there is another regime of reasonable
parameter choices for which one concludes that the
smallest possible number of stimuli (i.e. only ~4500)
are discriminable. The only assumption required to obtain
these estimates is that performance in new subjects
is similar to performance in the original subjects.

The fragility of the conclusion results from the claim
in [1] that a modest (if very interesting) correlation -
Figure 3: The estimation framework supports nearly any alternative conclusion, including the smallest and largest estimates possible under the
framework. A: Heat map showing alternative conclusions reached for different choices of T, the number of mixture pairs per class to test, and
application of alternative significance threshold α for discriminability, with the data from [1]. Asterisks (*) show the parameter regime (T = 20
mixtures, α = 0.05) used in [1]. Other values on each axis are chosen in a geometric progression around those parameters. The contour in
the lower right labeled ‘All’ demarcates a regime in which one will conclude that the largest possible number of mixture stimuli (i.e. all
z(d = 0) = (128 choose 30) > 10^29 of them) are discriminable (see equation 1). The contour in the upper left labeled ‘smallest possible’ demarcates a regime
in which one will conclude that the smallest possible number of stimuli are discriminable, i.e. only z(d = N = 30) < 5000 of them. The contour
labeled ‘colors’ demarcates a regime in which one concludes that the number of discriminable olfactory stimuli is the same order of magnitude
as the number of discriminable colors. B: Heat map similar to A, only with number of subjects on the vertical axis. A choice of α = 0.025 is
necessary to obtain the estimate that [1] reports for this analysis. C: Colorscale for A and B, with reference landmarks.
A

# Discriminable stimuli (z)   Significance threshold (α)   # Tests per class (T)
2.02 * 10^12                  0.05*                        20*
4.56 * 10^3 †                 0.05*                        5
1.54 * 10^29 ‡                0.05*                        185
8.94 * 10^3                   0.001                        20*
1.79 * 10^4                   0.01                         15

B

# Discriminable stimuli (z)   Significance threshold (α)   # Subjects (S)
3.81 * 10^13                  0.025*                       26*
4.56 * 10^3 †                 0.025*                       7
1.54 * 10^29 ‡                0.025*                       135
3.47 * 10^7                   0.001                        26*
2.98 * 10^[?]                 0.01                         15

Table 2: Estimates of z, the number of discriminable olfactory stimuli,
for different possible parameter values, for the C = 128, N = 30 case
used in [1]. This recapitulates selected points from Figure 3. * indicates
that the parameter value was used in [1]. We assume here that new
subjects perform similarly to the original subjects. Note that 4.56 *
10^3 (†) and 1.54 * 10^29 (‡) are the smallest and largest possible values
allowed by the framework from [1].

between the discriminability of a pair of mixtures and
the overlap (fraction of shared components) of those
mixtures - is evidence that a particular degree of mixture
overlap defines a boundary that partitions the discriminable
from the indiscriminable in a very high-dimensional
space. Below, we explore the consequences
of this decision, and its implications for calculating the
number of discriminable olfactory stimuli.

3 Explanation of the problems with the estimate

3.1 Recap of the basic framework 

The framework’s logic is built on an analogy to color vision,
where estimating the number of discriminable colors
requires knowing only two numbers: the size of the
stimulus space (that is, the range of visible wavelengths),
and the minimally discriminable distance between a typical
pair of stimuli (Fig. 4). Dividing the first number by
the second amounts to asking how many discriminable
intervals can be ‘packed’ into the stimulus space, with
that number providing an estimate of the number of
discriminable color stimuli.
Figure 4: ‘Sphere packing’ to estimate the number of discriminable
colors: the motivation behind the framework in [1]. A: Hypothetical
example showing a range of visible wavelengths. Relative to a
reference stimulus (thick vertical tick mark), extremely distant stimuli
(green circle) in this space are easy to discriminate, whereas extremely
close stimuli (red circle) may be impossible to discriminate, as they are
beyond the resolution of color vision. At some critical inter-stimulus
distance, d, stimuli will be ‘just discriminable’ (black circle). A typical
stimulus pair on the space, separated by distance D, will tend to
be discriminable if D > d, and indiscriminable if D < d. B: This partitioning
into discriminable and indiscriminable sets is captured in the
sigmoidal shape of the psychometric curve plotting discriminability vs.
distance. Knowing that an interval of length d on the space will tend
to span ‘just discriminable’ stimuli, one can calculate how many such
intervals, z, can be ‘packed’ onto the space to estimate the number of
discriminable colors.


Because olfactory stimuli do not have obvious physical
dimensions analogous to wavelength, olfaction is not
amenable to an identical calculation. Instead, [1] established
a theoretical framework that yielded a similar calculation
based upon the same underlying idea. [1] proposed
to divide the size of an investigator-determined olfactory
stimulus space by a data-determined variable representing
resolution in this space. Instead of being continuous,
one-dimensional, and defined by some intrinsic
stimulus variable like wavelength, the olfactory stimulus
space was defined to be the discrete, high-dimensional
space spanned by all mixtures containing N = 30 different
components (molecules) that could be assembled
from a library of C = 128 molecules; [1] also considers
the N = 10 and N = 20 cases, which we ignore in this
section with no loss of generality. This space of possible
mixture stimuli is astronomically large (C choose N), owing to
the proverbial ‘combinatorial explosion’, and each point
in the space corresponds to a specific multi-component
mixture.

One definition of distance between stimuli in this
space is the number of components D by which the stimuli
differ. For example, nearest neighbors would be stimuli
sharing all components but one (D = 1), and the most
distant points in this space would be stimuli differing in
all components (D = N).
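
As a minimal sketch of this distance, mixtures can be represented as sets of component labels, with D obtained as N minus the number of shared components (D = N - O in the notation of Table 1). The integer labels and the helper name `mixture_distance` are hypothetical, chosen only for illustration.

```python
def mixture_distance(mix_a, mix_b):
    """Number of components in one mixture that the other lacks (D = N - O)."""
    a, b = set(mix_a), set(mix_b)
    if len(a) != len(b):
        raise ValueError("both mixtures must contain the same number N of components")
    overlap = len(a & b)       # O: components shared by the pair
    return len(a) - overlap    # D = N - O


# Hypothetical labels 0..127 stand in for a 128-molecule library.
reference = range(0, 30)                 # a mixture with N = 30 components
neighbor = list(range(1, 30)) + [100]    # same mixture with one component swapped
disjoint = range(30, 60)                 # shares no component with the reference

d_neighbor = mixture_distance(reference, neighbor)
d_disjoint = mixture_distance(reference, disjoint)
print(d_neighbor, d_disjoint)
```

The two example pairs correspond to the nearest-neighbor and most-distant cases described in the text.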

[1] showed that discriminability of a stimulus pair
tends to increase with the distance D between the stimuli
in that pair (Figure 2A), and then argued for the existence
of a special distance d corresponding to the D at
which stimuli are ‘just discriminable’. In other words,
for D > d stimuli should more often than not be considered
discriminable and for D < d they should more often
than not be considered indiscriminable. By calculating d,
one could in turn readily calculate the number of stimuli
within a distance D < d of a typical point in the stimulus
space using the provided formulas. Geometrically, the
set of stimuli with distance D < d from a reference stimulus
corresponds to a filled ‘ball’ of stimuli indiscriminable
from the reference stimulus at its center. Conversely,
the reference stimulus should be discriminable
from stimuli outside the ball. We could thus count the
number z of non-overlapping balls that can be packed
into the stimulus space, as proposed in [1], by analogy
to the example for color vision:


z(d) = \frac{\binom{C}{N}}{\mathrm{ball}(d/2)}    (1)

where ‘ball’ is defined as:

\mathrm{ball}(d/2) = \sum_{n=0}^{d/2} \binom{N}{n} \binom{C-N}{n}    (2)
Equation 1 produces the final estimate z of the number
of discriminable stimuli. C and N are fixed by experimenter
choices, and d - the resolution-like term -
is the only quantity derived from data that is related to
measured psychophysical performance. Note that for
C = 128, N = 30, as used in [1], the largest and smallest
possible values this equation can produce are ~1.5 * 10^29
(for d = 0) and ~4500 (for d = N), respectively. Assuming
this framework is conceptually unproblematic (but
see section 4), the only question becomes: How do we derive
d from the data?
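
The two extremes quoted above can be checked numerically. The sketch below assumes that the ball size is the sum over n from 0 to d/2 of (N choose n)(C - N choose n), i.e. the number of mixtures reachable by swapping up to d/2 components; it also truncates a non-integer d/2, whereas the analysis in [1] interpolates. The function names are illustrative only.

```python
from math import comb


def ball(radius, C=128, N=30):
    """Mixtures within `radius` component swaps of a reference mixture.

    Assumed form: choose n of the N components to drop and n of the
    C - N unused library components to add, for n = 0..radius.
    """
    return sum(comb(N, n) * comb(C - N, n) for n in range(int(radius) + 1))


def z(d, C=128, N=30):
    """Estimated number of discriminable stimuli for critical distance d."""
    return comb(C, N) / ball(d / 2, C, N)


z_max = z(0)    # d = 0: every mixture counts as discriminable
z_min = z(30)   # d = N: poorest resolution the framework allows
print(f"largest: {z_max:.3g}   smallest: {z_min:.3g}")
```

Everything between these bounds is determined by the single data-derived number d, which is why the derivation of d carries so much weight.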
Figure 5: Behavior of psychometric curves for hypothetical data describing
discriminability vs. inter-stimulus distance. A: Left, a sharply
sigmoidal relationship in which discriminability changes dramatically
and categorically at a critical inter-stimulus distance, d. In all panels, d
is the value of the inter-stimulus distance D at which a threshold fraction
of stimulus pairs are discriminable. In the left panels, this threshold
is set at 0.5. Right, the resulting value of d is nearly invariant to the
choice of threshold. B: Same as above, only for a less sharply sigmoidal
data set. There is still a narrow regime in which d is largely
invariant to choice of threshold. C: Same as above, only for a weakly
sigmoidal data set. Here, there is no principled means for choosing the
d that is characteristic of discriminability relationships for stimuli. The
data in C do not support an interpretation in which there is a defensible
characteristic ‘length scale’ for inter-stimulus distances.


3.2 Derivation of the critical parameter d 

3.2.1 Thresholding the fraction discriminated 

A classic psychometric curve (Figure 4B), showing discriminability
as a function of inter-stimulus distance D,
admits a few plausible ways to derive d. The simplest
is to use a discriminability threshold, such
that d corresponds to the distance D at which the ‘fraction
correct’ reaches a certain value. In the three-alternative
forced-choice experiments of [1], chance responding
would produce a fraction correct of 1/3, so the appropriate
threshold would be somewhere between 1/3 and 1.
This threshold choice would be arbitrary - we might say
that one particular fraction correct reflects ‘discriminable’,
or alternatively we might choose any other value between
1/3 and 1.

If the psychometric curve is sufficiently steep near
some value of D (Figure 5A represents an ideal case)
then the derived d will vary minimally over a wide range
of choices for the threshold. In this scenario, we might
be confident that the d we derive is a truly meaningful
measure of resolution - it would be robust. If not (Figure
5C), it will be very fragile. To test whether this scenario
applies to the data from [1], we plotted the fraction
discriminated vs. percent mixture overlap for those data (Figure
6), and then varied the threshold from 1/3 to 1, using
regression and interpolation to obtain the distance d
corresponding to this threshold (Figure 7A, thick red line), by
analogy to the framework in [1].



Figure 6: Discriminability vs. mixture overlap. Analogous to Figure 2,
except plotting fraction discriminated directly (as in Figure 5), instead
of fraction significantly discriminable. The threshold (50%) and the
procedure for computing mixture overlap at that threshold are as in
Figure 2A. Derived from data in [1] as for Figure 2.

We subsequently used those values to compute the corresponding
number of discriminable stimuli z (Figure 7A,
black curve). These results show that neither the estimates
of d nor, by extension, that of z are robust across
this range of thresholds, so it is impossible to report
with any confidence the number of discriminable stimuli
using this approach. Intuitively, this ‘choose your favorite
threshold’ strategy is problematic, as it effectively
amounts to picking a target number between ~10^3 and
~10^29. Below, we show that the actual framework used
in [1] is nominally employed to make a more principled
choice of threshold; however, it merely cloaks the arbitrariness
of the threshold choice and does not eliminate
it.
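
A small sketch makes the sensitivity concrete. It assumes a purely hypothetical linear psychometric relation in which fraction correct rises from chance (1/3) at D = 0 to 1 at D = N; this line is not fitted to the data from [1]. It also assumes the ball formula sum over n of (N choose n)(C - N choose n) and truncates d/2 rather than interpolating.

```python
from math import comb

C, N = 128, 30
CHANCE = 1 / 3


def z_from_d(d):
    # Truncate d/2 to an integer radius; the tiny offset guards against
    # floating-point results such as 5.999999 when d/2 should be 6.
    radius = int(d / 2 + 1e-9)
    ball = sum(comb(N, n) * comb(C - N, n) for n in range(radius + 1))
    return comb(C, N) / ball


def d_from_threshold(threshold):
    # Invert the hypothetical line: fraction_correct = CHANCE + (1 - CHANCE) * D / N
    return N * (threshold - CHANCE) / (1 - CHANCE)


estimates = {t: z_from_d(d_from_threshold(t)) for t in (0.4, 0.5, 0.6, 0.7, 0.8)}
for t, z in estimates.items():
    print(f"threshold {t:.1f}: d = {d_from_threshold(t):4.1f}, z = {z:.2e}")
```

Under this assumed relation, moving the threshold by a few tenths shifts the estimate by many orders of magnitude, which is the behavior the text attributes to data lacking a characteristic length scale.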

[Figure 7 image: panels A and B. Horizontal axis: threshold defining discriminability (% correctly discriminated). Upper bound marked: all possible olfactory stimuli (C choose N).]

Figure 7: Relationship between the estimated number of discriminable
stimuli z and the choice of threshold defining discriminability.
A) The thick red line shows the critical distance d that would result
from the data in [1] for a range of ‘fraction discriminated’ thresholds
between 100% (perfect discrimination), and 33.3% (chance discrimination).
The curve was obtained by regression on plots like that in
Figure 6, by analogy to Figure 2 and [1]. Note that d exhibits a nearly
constant-slope relationship with threshold, meaning the data are not defined
by a characteristic length scale, much like in Figure 5. The thick
black curve shows the relationship between z and the chosen threshold.
This relationship was obtained directly from d, using equation 1 as in
[1]. The thin red lines correspond to the same calculation for d but
using data for only a single subject (one per line), showing similar sensitivity
to the choice of threshold. The absence of a robust d for any
individual subject argues that the group data are not simply explained
by averaging across a population with well-defined, but diverse values
of d. Note that very modest and reasonable alternative choices for the
threshold result in extremely disparate estimates. The vertical axis is
bounded by the smallest and largest possible number of discriminable
stimuli allowed by the framework. The dashed lines are a visual guide
to specific (threshold, z) pairs. B) Box and whisker plots showing the
median and inter-quartile range for z when restricting the analysis to
individual subjects. Note that the worst performing subjects under one
threshold can discriminate many more stimuli than the best performing
subjects under a slightly more liberal threshold (compare best subject
using a 60% threshold vs. worst subject using a 40% threshold).

3.2.2 Thresholding the fraction significantly discriminable

A variation on the above approach, and the one used in
[1], is to apply a threshold not to the fraction discriminated,
but to the fraction significantly discriminable. In
other words, determine for which subjects (or alternatively,
for which classes of mixtures) the fraction discriminated
is significantly greater than 1/3, i.e. for which
subjects the null hypothesis of chance discrimination
can be rejected. To facilitate visualization of this step,
[1] re-plotted the summary data (fraction correctly discriminated)
as fraction significantly discriminable (Figure
2A). This view of the data again provides a linear
relationship between distance D and, in this case, fraction
significantly discriminable, which holds across all
the values of N tested. The relationship is now steeper
than it was for fraction discriminated (compare Figures
2 and 6) because the extra hypothesis-testing step acts as
a strong non-linear threshold that exaggerates otherwise
small differences in the data. Again a choice of threshold
is required; [1] chose a threshold of 50% significantly
discriminable, and computed d using linear regression
and interpolation as above.

Because the linear relationship between the distance
D and the fraction significantly discriminable (Figure 2)
is steeper than it is for D and the fraction discriminated
(Figure 6), the former would appear to be less fragile
than the latter. Indeed, varying the threshold (i.e. 50%)
itself (not shown) will have much less effect on the computed
d (and consequently on z) for the former than for
the latter. However, by introducing a hypothesis-testing
step, the d derived from Figure 2 now varies systematically
with the number of subjects enrolled in the study
(and the number of mixtures tested), and with the choice
of significance criterion α. This is because each data
point used to compute d becomes the binary result of a
hypothesis test, each of which depends critically on sample
size and test specificity. Because d is then fed into an
expression (equation 1) that explodes geometrically, the
result is a recipe for producing any of a range of estimates
for z that one might choose. If one enlists more subjects
or slackens the significance criterion, a very large
(even the largest possible) number will be obtained. If
one enlists fewer subjects or makes the significance criterion
more strict, a very small (even the smallest possible)
number will be obtained. Figure 8 shows the explicit dependence
of the estimate on these quantities. Naturally,
these can be varied in tandem too, with even more dramatic
consequences, as described above (Figure 3 and
Table 2).
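
The dependence of a binary significance outcome on sample size can be sketched with an exact one-sided binomial test against chance (1/3). The choice of test and the true success probability of 0.45 are assumptions made for illustration; neither is taken from [1].

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def power(n_tests, p_true, alpha=0.05, p_chance=1 / 3):
    """Probability that a one-sided exact binomial test rejects chance."""
    # Smallest number of correct answers whose tail probability under
    # chance is at most alpha; k = n_tests + 1 means "never reject".
    k_crit = next(k for k in range(n_tests + 2)
                  if binom_tail(k, n_tests, p_chance) <= alpha)
    return binom_tail(k_crit, n_tests, p_true)


P_TRUE = 0.45  # hypothetical fixed performance, modestly above chance
powers = {n: power(n, P_TRUE) for n in (5, 20, 185)}
strict = power(20, P_TRUE, alpha=0.001)

for n, pw in powers.items():
    print(f"{n:3d} tests: P(significant) = {pw:.2f}")
print(f" 20 tests, alpha = 0.001: P(significant) = {strict:.2f}")
```

With performance held fixed, the probability of being counted as ‘significantly discriminable’ rises with the number of tests and falls with a stricter α, so the fraction significantly discriminable, and hence d, moves with these incidental choices.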

A hypothesis test is meant to assess the strength of evidence
for or against a hypothesis (often against a null
hypothesis), not to make a point estimate. However, it
may not be uncommon for researchers to use hypothesis
testing in the manner done in [1] - to count the number
or fraction of data points exhibiting a certain property.
In many cases this may amount to a venial statistical sin
with (hopefully) benign consequences. But that is unfortunately
not the case here, due in part to the extremely
steep dependence of z on d guaranteed by equation 1.

If one claims an estimate to be meaningful, it is fair
to ask how vigorously one would have to defend a specific
choice of arbitrary experimental parameters to defend
a particular order-of-magnitude range around that
estimate. Unfortunately, the systematic sensitivities exhibited
here severely undermine the plausibility and relevance
of the estimate reported in [1]. Due to these sensitivities,
one could pick almost any number of discriminable
stimuli in advance, and affirm this number using
these or similar data. [1] simply exchanged the arbitrariness
of a ‘fraction discriminated’ threshold for the arbitrariness
of the sample size and α. Ultimately, the absence
of a robust d to characterize the data is an insurmountable
obstacle for the framework.

4 Building the stimulus space
4.1 The structure of the stimulus space 

One might ask: what is the right way to calculate d in
order to obtain a robust estimate of the number of discriminable
stimuli? Before heading down this road and
devising alternative statistical approaches, it is worth first
clearly articulating the assumptions of a framework in
which a single variable plays such a special role. Under
what conditions is it sensible to expect that plugging
a single data-derived number into equation 1 will produce
a meaningful lower bound of the number of discriminable
olfactory stimuli?

To gain some intuition into this, we can ask the analogous
question in the simplified visual system example
(Figure 4) that was used as the principal motivation for
the procedure. The ‘sphere packing’ calculation in this
case naturally involves measuring the resolution of perception
in terms of the stimulus, but its validity is not a
consequence of this measurement alone. Rather, the procedure
in Figure 4 is sensible because the thing we are
calling an independent stimulus dimension (wavelength)
is respected as such by perception: we encounter monotonically
changing, non-redundant percepts as we move
from one extreme of the stimulus space to the other. If
we didn’t - say, if the same percept ‘blue’ were experienced
for several non-overlapping disjoint intervals - the
sphere packing formulation would fall apart. We might
observe that on average discriminability improves with
distance, but this would not be evidence of a characteristic
length scale that partitions stimulus pairs into discriminable
vs. indiscriminable sets.

Thus the sphere-packing framework is valid only if the
underlying geometry of stimulus space (that the investigator
has designed) aligns with the geometry of perceptual
space (as implemented in neural circuitry). Formally,
the map from stimulus space to perceptual space
needs to be homeomorphic, or nearly so. See [6] for further
insight on this issue.

Figure 8: Steep, systematic, and non-asymptotic dependence of the estimate on sample size (S or T) and threshold α for statistical significance. A)
Dependence of the estimate (for mixtures of N = 30) on sample size. Black shows dependence on the number of subjects S enrolled in the study.
Red shows dependence on the number of mixtures T tested per mixture class. Once the number of mixtures or subjects tested is ~150 (by no means
an unusually large sample size), the conclusion that all possible (C choose N) mixtures are discriminable is guaranteed, in contradiction with experimental
results. B) Dependence of the estimate on the significance threshold α with (red) and without (black) a correction for multiple comparisons. [1] did
not correct for multiple comparisons.

4.2 Redundancy in the stimulus space 

Instead of providing evidence for this homeomorphism,
[1] assumed for the purposes of calculation that each
component of the molecular library (of size C = 128
in [1]) spanned an informative additional dimension for
perception to explore: each molecule in the library is
treated as an olfactory primary that is independent of all
the others. This is the assumption, codified in the numerator
of equation 1, that allows for a massive space of
potential discriminable stimuli. Indeed, the guaranteed
runaway growth of the numerator as molecules are added
to the C-sized library was offered in [1] as an argument
for why the reported ‘trillion’ figure is a lower bound -
after all, C could always be higher.

It is worthwhile to quantify the behavior of the estimate
as C changes. First, the estimate depends geometrically
on C, with a power law exponent of ~30 (Fig. 9,
blue line). In other words, if the chemical library were
doubled, the estimate z would increase by a factor of 2^30
under constant performance. If the component library
were increased to the size of a standard flavor and fragrance
catalog (~2000 chemicals), the estimate would
increase to z ~ 10^41, implying a unique olfactory percept
for each carbon atom on earth.

Subjects’ performance could become worse when 
mixtures are drawn from this larger, more complete 
library, and we acknowledge that we cannot know in 
advance what the newly calculated resolution d would be 
on the new stimulus space. In other words, as the 
numerator of equation [1] increased, its denominator 
(given by equation [2]) might conveniently grow 
proportionally. Let us therefore assume that with a 
library of sufficient size, so many mixtures become 
indiscriminable that the resolution becomes as poor as 
the framework allows, with d = N. Even in this edge 
case, in which only mixtures differing in all components 
are ‘just discriminable’, we would still calculate 10^21 
discriminable stimuli. If C is increased to 10^6, the 
smallest possible number of discriminable percepts (under 
the assumption of worst measurable performance, as above) 
is 10^61, or ten million trillion unique olfactory 
percepts for every carbon atom on earth (Fig. [9], red 
line). One may object that the inflation of C here is an 
unfair critique, as the perceptual redundancy of 
molecules must at some point provide an important 
constraint on the size of the artificially constructed 
stimulus space. Indeed, it has been reported that as few 
as thirty components are required to imbue most mixtures 
with a common smell, even when there is no component 
overlap between the mixtures. But this is the essence of 
the problem with equation [1]: where does that point lie, 
and why wasn’t the constraint important to consider for 
the original C = 128 molecular library? 

Figure 9: Explosive growth of the estimate z with the size (C) of the 
molecular library. The number of possible stimuli z that can be 
assembled by choosing N = 30 distinct molecules from a library of size 
C increases geometrically with C (black line). If a library of a 
different size had been used, and similar subject performance resulted, 
the estimated number of discriminable stimuli z would grow along a 
similar trajectory (blue line). Even if performance deteriorated as C 
increased, the estimate could never fall below the red line, which 
represents worst-case performance (d = N). This results from the 
combinatorial explosion inherent in equation [1]. 
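
The combinatorial growth discussed above can be checked with a short sketch. It computes only the number of possible mixtures, i.e. the number of ways to choose N = 30 distinct molecules from a library of size C, and not the estimate z itself, since that also requires the resolution d. The function name and the intermediate library size of 500 are illustrative assumptions. 

```python
from math import comb, log10

N = 30  # components per mixture, as in the experiment discussed in the text


def log10_possible_mixtures(C, N=N):
    """log10 of the number of ways to choose N distinct molecules from C."""
    return log10(comb(C, N))


# 128 is the original library size; 2000 and 10**6 are the larger
# libraries considered in the text; 500 is an arbitrary intermediate value.
for C in (128, 500, 2000, 10**6):
    exponent = log10_possible_mixtures(C)
    print(f"C = {C:>7}: about 10^{exponent:.1f} possible mixtures")
```

For C = 128 this gives a count between 10^29 and 10^30, in line with the "more than 10^29" possible stimuli quoted for the original library. 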


5 Avenues for improving the estimate 

If one is seeking a conservative estimate of the number 
of discriminable stimuli in a perceptual space whose 
organization and intrinsic dimensionality are poorly 
understood, it is arguably more appropriate to use a 
model that accounts for the data with the smallest number 
of dimensions. The massive estimates possible in the 
framework are an immediate consequence of a definition of 
dimensionality driven by experimenter designation, not 
data. 

We therefore propose an alternative framework: use 
experimental data to create a working map of the 
perceptual space, and then apply the sphere-packing 
framework to that map, rather than to a map of the 
stimulus space. In cognitive science, psychometrics, and 
marketing, subject responses to stimuli are used to 
create maps of the underlying perceptual (or conceptual) 
representations of those stimuli. These maps are 
characterized by the attribute that pairs of items which 
are considered intuitively to be perceptually near (rated 
similar or difficult to discriminate) are nearer to one 
another on the map than pairs of items which are 
perceptually more distant (rated dissimilar or easy to 
discriminate). There are many algorithms for generating 
such maps, many of which have been used before in 
olfaction, including variants of PCA, non-negative matrix 
factorization (NMF), and multi-dimensional scaling. While 
there are open questions in the generation of these maps 
(e.g. how many dimensions should they have?), they all 
have the virtue that their accuracy can be checked (e.g. 
by examining the correlation between subjects’ 
indications of item pair dissimilarity and the distance 
between that pair on the map), and thus the maps can be 
improved. Developing these maps may also have the 
collateral benefit of revealing stimulus dimensions 
intrinsic to olfaction (if any), which could constrain 
the experimental choice of a resolution to measure. 

Unfortunately, it is difficult if not impossible to create 
these maps from the data discussed here, because each 
mixture of a tested pair is used only once in [1], in that 
pair alone, and never in any other pairs. Thus, there are 
no serial comparisons of the same mixture that could be 
used to anchor a stimulus on the map relative to anything 
other than that one stimulus against which it was 
directly compared experimentally. Consequently, there is 
no way to compute distances between stimuli that do not 
appear together in a tested pair. In future experiments, 
such serial repetition of already-tested mixtures would be 
required to build up a data set to which the proposed 
method could be applied. 

6 Appendix 


Here, we provide a more detailed statistical argument 
describing the framework’s extreme sensitivity to 
incidental parameters. The crux of the statistical issue 
is this: the framework could only be valid if d, the 
estimated difference limen used in the calculation step, 
is a measure of olfactory resolution that converges to 
the true value of this quantity as more data is 
collected, i.e. if it is consistent. 

‘Significantly discriminable’ is a moving target, 
dependent on sample size, choice of significance 
criterion, and correction for multiple comparisons. And d 
is the only data-dependent value used in subsequent 
calculations (equation [1]). Together, this guarantees 
that the estimate of z in [1] is a moving target as well, 
dependent on these same parameters. d is generated by 
testing a number of null hypotheses, and is closely 
related to the fraction of these which are rejected. But 
the probability of, and criteria for, rejection of these 
null hypotheses depend critically on sample size and α, 
the values that we explored in Figure [3] and Table [2]. 
Certainly, we would agree that there is nothing 
objectionable about the specific parameters chosen in 
[1]. However, there is nothing objectionable about many 
other values for those parameters either. 

In effect, calculating d is somewhat like judging 
whether a coin meets a cutoff for being fair based on a 
series of tosses. It matters very much how many tosses 
one makes, and how much deviation from chance one is 
willing to tolerate before calling a coin unfair. If you 
have no particular reason to believe a coin is unfair, you 
might be disinclined to call it unfair if you observe 
(60%) heads, but probably not if you observed heads 
(also 60%). However, if you own a casino, you might 
call 5100 heads in 10000 (51%) evidence of an unfair 
coin. Whether the coin is fair is not something we 
directly measure; rather, we have more or less evidence 
for various degrees of fairness. 
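
The dependence on the number of tosses can be made concrete with a small sketch that computes the exact one-sided probability of seeing at least a given number of heads from a fair coin. Only the 5100-in-10000 case comes from the text. The 6-in-10 and 60-in-100 cases are illustrative assumptions chosen to give the same 60% proportion at two sample sizes, and the use of a one-sided tail is also an assumption. 

```python
from math import comb


def upper_tail_p(heads, tosses):
    """Exact P(X >= heads) for X ~ Binomial(tosses, 1/2), via integer arithmetic."""
    term = comb(tosses, heads)  # number of sequences with exactly `heads` heads
    total = 0
    for i in range(heads, tosses + 1):
        total += term
        # C(tosses, i + 1) = C(tosses, i) * (tosses - i) / (i + 1), exactly
        term = term * (tosses - i) // (i + 1)
    return total / 2**tosses


# (heads, tosses): the first two pairs are hypothetical, the last is from the text
for heads, tosses in ((6, 10), (60, 100), (5100, 10000)):
    p = upper_tail_p(heads, tosses)
    print(f"{heads} heads in {tosses} tosses ({heads / tosses:.0%}): p = {p:.4f}")
```

The same 60% proportion is unremarkable in 10 tosses but falls below a conventional 0.05 cutoff in 100 tosses, and a 51% proportion does so in 10000 tosses. 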

A similar situation arises in [1]’s analysis, as can be 
seen from its formal definition of d (a definition we 
verified by reconstructing the critical figures from [1] 
in Figure^). d is defined as that inter-stimulus distance 
D for which 50% of subjects can significantly 
discriminate a mixture class. By a mixture ‘class’ we 
denote the set of mixture pairs for which each mixture 
has the same number of total components (N) and each pair 
has the same number of distinct, non-overlapping 
components D (D = N - O; see Table [1]). For example, the 
mixture pair (ABC, ABD) would be a member of the class 
with N = 3 and D = 1 distinct components. We focus here 
on calculations pertaining to the number of tests T per 
class, but the same argument is readily translated to the 
number of subjects S. 

To assess significant discriminability from chance, [1] 
used a two-tailed binomial test. Thus, if the p-value is 
smaller than α/2, the subject is considered able to 
significantly discriminate pairs in the mixture class. 
The p-value is given by 1 minus the cumulative binomial 
distribution function for n = T trials, k successes, and 
a probability of success equal to 1/3, with k 
corresponding to the number of trials discriminated 
correctly, and 1/3 to chance in a 3-way forced choice 
task. Thus, the subject’s discrimination performance is 
significant if: 

$$\alpha/2 > 1 - \mathrm{cdf}_{\mathrm{binomial}}(T, k, 1/3) = 1 - \sum_{i=0}^{k} \binom{T}{i} \left(\frac{1}{3}\right)^{i} \left(\frac{2}{3}\right)^{T-i} \tag{3}$$

For α = 0.05 and T = 20 (as used in [1]), this inequality 
is satisfied for k >= 11. For each subject, k might be any 
value between 0 and 20 depending on olfactory acuity. If 
k >= 11 for more than 50% of subjects, then the value 
of D characterizing that mixture pair is necessarily > d. 
If k >= 11 for fewer than 50% of subjects, then D < 
d. If k >= 11 for exactly 50% of subjects, then D = d. 
The actual estimate for d is obtained by regression in the 
spirit of Figure [2]. 
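
The threshold of 11 correct out of 20 can be reproduced with a minimal sketch of the significance criterion described above, assuming chance performance of 1/3 and comparing the upper binomial tail with α/2. The function names are illustrative and are not taken from [1]. 

```python
from fractions import Fraction
from math import comb


def upper_tail(T, k, p=Fraction(1, 3)):
    """P(X > k) for X ~ Binomial(T, p), i.e. 1 - cdf_binomial(T, k, p)."""
    return sum(comb(T, i) * p**i * (1 - p) ** (T - i) for i in range(k + 1, T + 1))


def critical_k(T, alpha=0.05, p=Fraction(1, 3)):
    """Smallest k for which alpha/2 > 1 - cdf_binomial(T, k, p)."""
    for k in range(T + 1):
        if upper_tail(T, k, p) < Fraction(alpha) / 2:
            return k


print(critical_k(20, 0.05))  # 11, matching the threshold quoted in the text
```

Exact rational arithmetic is used so that the comparison with α/2 is not affected by floating-point rounding near the threshold. 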

What kind of subject can discriminate successfully 11 
times out of 20? Consider a mixture class X_{N,D} 
(characterized by N and D), and a subject performance of 
f_{N,D}, corresponding to the proportion of mixtures 
correctly discriminated from a sample of size T. Note that 
f_{N,D} is simply the abscissa of Figure 1 from [1]. A 
subject with f_{N,D} = 0.55 would get k = T * f_{N,D} = 11 
out of T = 20 correct on average. So we can rewrite the 
inequality above as an equation: 

$$1 - \alpha/2 = \sum_{i=0}^{f_{N,D} T} \binom{T}{i} \left(\frac{1}{3}\right)^{i} \left(\frac{2}{3}\right)^{T-i} \tag{4}$$


If the above equation is satisfied, then the subject will 
be considered to be on the boundary between significantly 
discriminating and not significantly discriminating 
mixture pairs in the class. If half of subjects perform 
better than f_{N,D}, and half worse, then half of subjects 
will be considered to significantly discriminate mixture 
pairs in the class (and half not), and so d will be set 
equal to D. This is simply the definition of d. 

The value f_{N,D} for which that equation is satisfied 
depends upon α and T. f_{N,D} is related to N and D 
through the data, and so the value of D for which the 
equation is satisfied (i.e. D = d) depends upon α, T, and 
the data. However, it is inappropriate for the 
discriminability limen to depend on α and T in this way. 
As we showed above, this has serious consequences for the 
estimate of d, and therefore also for the estimate of z. 
It is what makes z inconsistent. 

Figure [10] shows the relationship between the critical 
f_{N,D}, T, and α. Note that this relationship is 
independent of the data. The data only determine how 
f_{N,D} depends upon D and consequently determine z. In 
summary, a smaller (larger) value of α or T requires a 
much higher (lower) value of f_{N,D} to satisfy the 
equation. This higher (lower) value of f_{N,D} might only 
be found at a much larger (smaller) value of D, implying a 
much larger (smaller) value of d and therefore a much 
smaller (larger) value of z. 
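
The data-independent relationship between the critical fraction correct, T, and α can be tabulated with a short sketch. It assumes chance performance of 1/3 and the α/2 criterion described above. The function name and the particular grid of T and α values are illustrative assumptions. 

```python
from fractions import Fraction
from math import comb


def critical_fraction(T, alpha=0.05):
    """Smallest k/T for which alpha/2 > P(X > k), with X ~ Binomial(T, 1/3)."""
    denom = 3**T                          # common denominator of all probabilities
    limit = Fraction(alpha) / 2 * denom   # alpha/2 on the same integer scale
    tail = 0                              # P(X > k) * denom, starting at k = T
    k = T
    while k > 0:
        term = comb(T, k) * 2 ** (T - k)  # P(X = k) * denom
        if tail + term >= limit:          # k - 1 would no longer be significant
            break
        tail += term
        k -= 1
    return k / T


for alpha in (0.05, 0.001):
    for T in (10, 20, 100, 1000):
        print(f"alpha = {alpha}, T = {T:>4}: critical fraction = {critical_fraction(T, alpha):.3f}")
```

For T = 20 and α = 0.05 the critical fraction is 0.55. It rises when α is made stricter and falls toward chance (1/3) as T grows. 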

With a sufficient number of subjects (or tests), even 
barely above chance performance can produce estimates 
of z equal to the largest possible number of stimuli 
(Figures [T] and [S}). In fact, this is guaranteed by 
equation [4]. The critical values of f_{N,D} required for 
statistical significance will asymptotically approach 1/3 
(chance) as T approaches infinity. The same principle 
applies to a consideration of changes to the number of 
subjects S, instead of the number of tests. This 
illustrates the core of the problem. Discriminating 
significantly above chance can be a very high bar or a 
very low bar depending on the parameters of the 
experiments and the analysis, including S, T, and α. 

Figure 10: Fraction discriminated at which statistical significance is 
reached. For each possible value of the number of tests T conducted 
per mixture class, there is a cumulative distribution of the fraction f 
of those tests that will be correctly discriminated, under the null 
hypothesis of chance (1/3) responding. The choice of significance 
threshold α determines the fraction correct required to reject the null 
hypothesis, and thus count as ‘significantly discriminating’ in the 
framework. For a given value of α (0.05 shown here, and used in [1]), 
the fraction correctly discriminated required to reach this threshold 
varies greatly with T. Rejecting the null hypothesis can thus be very 
easy or very hard depending on T (or the number of subjects S, not 
shown), or on α. 